Abstract: The problem of source coding with side information (SCSI) is closely related to channel coding. Therefore, the existing literature focuses on using the most successful channel codes, namely LDPC codes, turbo codes, and their variants, to solve this problem, assuming classical unique decoding of the underlying channel code. In this paper, in contrast to classical decoding, we take the list decoding approach. We show that syndrome source coding using list decoding can achieve the theoretical limit. We argue that, as opposed to channel coding, the correct sequence can effectively be recovered from the list produced by the list decoder in the case of SCSI, since we are dealing with a virtual noisy channel rather than a real noisy channel. Finally, we present a guideline for designing constructive SCSI schemes using Reed-Solomon, BCH, and Reed-Muller codes, which are the known list-decodable codes.

I. Introduction 

Recently, in the context of sensor networks and mobile multimedia applications [1], [2], [3], distributed source coding has gained significant attention from the research community. The information-theoretic limit for independent encoding of correlated sources was established by Slepian and Wolf in [4]. According to the Slepian-Wolf theorem, independent encoding of correlated sources with joint decoding can be as efficient as joint encoding and decoding. More specifically, in the compression of two correlated sources $\{X_i\}$ and $\{Y_i\}$, the rates achievable with independent encoding but joint decoding are bounded by $R_X \ge H(X|Y)$, $R_Y \ge H(Y|X)$, and $R_X + R_Y \ge H(X,Y)$. In this paper, we focus on the asymmetric approach, where $\{Y_i\}$ is encoded at a rate $H(Y)$ in the conventional way and $\{X_i\}$ is encoded at a rate $H(X|Y)$ assuming that $\{Y_i\}$ is available at the decoder. The asymmetric approach is also known as source coding with side information (SCSI) in the literature.

The essential idea of distributed source coding is binning [5], [6]. Consider the encoding of a source $\{X_i\}$ in the presence of the side information $\{Y_i\}$ available at the decoder. Let $\mathcal{X}$ and $\mathcal{Y}$ be the alphabets of $\{X_i\}$ and $\{Y_i\}$, respectively. For large enough $n$, with high probability a source sequence $\mathbf{x} \in \mathcal{X}^n$ belongs to a set of approximately $2^{nH(X|Y)}$ sequences that are jointly typical with the side information sequence $\mathbf{y} \in \mathcal{Y}^n$. Thus, if $\mathbf{y}$ were available at both the encoder and the decoder, the outcomes from the source $\{X_i\}$ could be encoded using approximately $H(X|Y)$ bits per symbol on average with a very small probability of error. In this case, both the encoder and the decoder could construct the same set of jointly typical sequences and use the same indexing, leading to correct decoding. However, even if $\{Y_i\}$ is not available at the encoder, it is possible to achieve the same rate of $H(X|Y)$. The idea is to randomly assign each of the source sequences in $\mathcal{X}^n$ to one of $2^{nR}$ bins, where $R > H(X|Y)$. Given a source sequence $\mathbf{x}$, the encoding operation is to transmit the index of the bin to which $\mathbf{x}$ belongs, and the decoding operation is to choose the sequence $\hat{\mathbf{x}}$ from the indexed bin that is jointly typical with the side information sequence $\mathbf{y}$. Since for large enough $n$, with high probability, all the sequences that are jointly typical with a given $\mathbf{y}$ belong to different bins, $\hat{\mathbf{x}}$ equals $\mathbf{x}$ with high probability.

It follows from the above that a practical binning algorithm needs to partition the source data space into a minimum number of bins while ensuring that, for any typical side information sequence, each bin contains only one jointly typical source sequence. In other words, in an appropriate measure of distance, it should put as many source sequences as possible in a bin while maximizing the minimum distance between any pair of sequences in the bin. Thus each bin can play the role of a good channel code. This connection between binning and channel codes was first indicated in [7] while interpreting Slepian-Wolf coding. Due to this close connection, most of the SCSI schemes proposed in the literature are based on the most successful channel codes, namely LDPC codes, turbo codes, and their variants. In these schemes, a channel code of a particular rate needs to be selected depending on the conditional entropy $H(X|Y)$. However, there is always a gap between the compression rate that can be achieved with a specific channel code and the conditional entropy for which it can yield near-lossless compression. For example, although the turbo code-based scheme in [8] and the LDPC-based scheme in [9] have been designed for compression rates of 0.67 and 0.25, they achieve near-lossless compression only when $H(X|Y) = 0.49$ and $0.20$ bits, respectively. More importantly, for a given conditional entropy, there is no guideline for choosing the rate of the code that can ensure near-lossless recovery.

In this paper we present an SCSI scheme based on list decoding. Although list decoding yields a list of codewords, as opposed to classical unique decoding, we demonstrate that the correct codeword can conveniently be extracted from the list in the case of SCSI. The main advantage of using list decoding is that it can improve the compression rate significantly compared to its classical counterpart. Moreover, the approach allows for a guideline for the choice of channel rate and channel code.

Fig. 1. An additive noise channel can be modelled as a bSC. Here, $\{U_i\}$ is an IID Bernoulli process with parameter $p$, and thus the crossover probability of the bSC is $p$.

The rest of the paper is organized as follows. We review the technique of syndrome source coding, a compression technique based on channel codes, in Section II. In Section III we present the notion of list decoding and describe how it can be used to design an SCSI scheme based on the technique of syndrome source coding. The use of existing list-decodable codes in the design of practical constructive codes for SCSI is detailed in Section IV. Finally, we conclude the paper in Section V.

II. Syndrome Source Coding 

The most challenging problem in the design of a practical binning scheme is the systematic construction of bins with algebraic structure, so that bin indexing and typical set decoding can be performed with reasonable complexity. In this regard, there is a close connection between binning and channel codes. A channel code induces a partitioning of the source data space into cosets that can be indexed by their respective syndromes [10]. If the cosets of a channel code are such that each of them, with high probability, contains only one sequence from the typical set, then the cosets effectively act as bins. In this case the index of the bin can be computed as the syndrome of the sequence.

Consider encoding a memoryless binary symmetric source $\{X_i\}$ with correlated side information $\{Y_i\}$ available only at the decoder such that

$$Y_i = X_i + U_i, \qquad (1)$$

where the addition is modulo 2 and $\{U_i\}$ is an IID Bernoulli source with parameter $p < 1/2$. If $\{Y_i\}$ were present at the encoder as well, it could encode $\{X_i\}$ at a rate $H(X|Y) = H(U) = -p \lg p - (1-p) \lg(1-p)$. According to the Slepian-Wolf theorem, $\{X_i\}$ can be compressed at the same rate even if $\{Y_i\}$ is present only at the decoder. This SCSI scenario can be modelled with an additive noise channel where $\{X_i\}$, $\{Y_i\}$, and $\{U_i\}$ correspond to input, output, and noise, respectively (see Fig. 1). Clearly, this additive noise channel is equivalent to a binary symmetric channel (bSC) with crossover probability $p$ (see Fig. 1). This modelling of the correlation between source and side information with a virtual channel allows us to use a channel code for the bSC to design an SCSI scheme [11], as described below.

The capacity $C_{\mathrm{bSC}}$ of the bSC, and equivalently of the additive noise channel, is

$$C_{\mathrm{bSC}} = 1 - H(U), \qquad H(U) = -p \lg p - (1-p) \lg(1-p).$$

According to the channel coding theorem [12], there exists an $(n,k)$ linear block code $C$ of rate $k/n = R > C_{\mathrm{bSC}} - \delta$ such that the error probability $P_e < \epsilon$ for any $\epsilon > 0$ and $\delta > 0$. Here the error event $\hat{\mathbf{x}} \ne \mathbf{x}$ corresponds to the fact that when the actual noise vector is $\mathbf{u}$, the decoder decides the noise vector to be $\hat{\mathbf{u}}$ with $\hat{\mathbf{u}} \ne \mathbf{u}$. Therefore, $\Pr(\hat{\mathbf{x}} \ne \mathbf{x}) = \Pr(\hat{\mathbf{u}} \ne \mathbf{u})$. Now consider the following scheme for compressing $\{X_i\}$, with the side information $\{Y_i\}$ available only at the decoder, based on this channel code. The encoding operation is to compute the syndrome of a source sequence $\mathbf{x} \in \mathcal{X}^n$ as $\mathbf{s} = H\mathbf{x}^T$, where $H$ is the parity check matrix of $C$. If $C_s$ denotes the coset corresponding to the syndrome $\mathbf{s}$, then clearly $\mathbf{x} \in C_s$. The decoding operation is to find the sequence $\hat{\mathbf{x}}$ in the coset $C_s$ nearest (in Hamming distance) to the side information sequence $\mathbf{y} \in \mathcal{Y}^n$. This is equivalent to finding the minimum-weight noise vector $\hat{\mathbf{u}}$ such that $\hat{\mathbf{x}} = \mathbf{y} + \hat{\mathbf{u}}$ lies in $C_s$. Thus the probability of error of the scheme is $\Pr(\hat{\mathbf{x}} \ne \mathbf{x}) = \Pr(\hat{\mathbf{u}} \ne \mathbf{u})$, the same as the channel decoding error, which tends to zero as $n \to \infty$. This coding scheme, which can compress $\{X_i\}$ at a rate $(n-k)/n < H(U) + \delta$ with an arbitrarily small probability of error, is known as syndrome source coding [13]. Clearly, if a channel code of rate $R$ is used for syndrome source coding, the achieved compression rate is $1 - R$.
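As a toy illustration of the scheme above, the following minimal sketch compresses 7-bit words to 3-bit syndromes with the (7,4) Hamming code and decodes with side information by searching the coset for the word nearest to $\mathbf{y}$. The choice of code, the brute-force search, and all function names are illustrative assumptions rather than part of the scheme as published.

```python
from itertools import combinations

# Parity check matrix of the (7,4) Hamming code (assumed toy code):
# column j holds the binary expansion of j + 1.
H = [
    (0, 0, 0, 1, 1, 1, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (1, 0, 1, 0, 1, 0, 1),
]
N = 7


def syndrome(v):
    """Return s = H v^T over GF(2) as a tuple of bits."""
    return tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)


def encode(x):
    """Compress the 7-bit source word x to its 3-bit syndrome."""
    return syndrome(x)


def decode(s, y):
    """Return the word of the coset C_s nearest to y in Hamming distance.

    Noise patterns are tried in order of increasing weight, so the first
    word x_hat = y + u with syndrome s uses a minimum-weight u.
    """
    for weight in range(N + 1):
        for support in combinations(range(N), weight):
            x_hat = tuple(b ^ 1 if i in support else b for i, b in enumerate(y))
            if syndrome(x_hat) == s:
                return x_hat
    raise ValueError("no word with this syndrome")
```

Because the Hamming code corrects any single error, `decode(encode(x), y)` returns `x` whenever `y` differs from `x` in at most one position. The compression rate is 3/7, that is, $1 - R$ with $R = 4/7$.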

III. Syndrome Source Coding Using List Decoding 

Clearly, the underlying linear block code $C$ and its associated decoding algorithm determine the performance of a syndrome source coder. Let $C$ be an $(n,k)$ linear block code over $GF(q)$. In the decoding of channel codes, the objective is to find the transmitted codeword $\mathbf{c} \in C$, given the received word $\mathbf{r} \in GF(q)^n$. The natural decoding approach is to find the codeword that has the maximum likelihood of having been transmitted given that $\mathbf{r}$ has been received. This approach, known as maximum likelihood decoding (MLD), amounts to finding the codeword $\hat{\mathbf{c}}$ closest to $\mathbf{r}$ in an appropriate measure of distance. However, MLD is known to be NP-complete in general [14]. Therefore, bounded distance decoding (BDD), which has greatly reduced complexity, is preferred in practice; it ensures correct decoding only when the number of errors is upper bounded by some error correcting radius $\tau$. Obviously, unambiguous BDD is possible only if $\tau \le \lfloor (d_{\min} - 1)/2 \rfloor$, where $d_{\min}$ is the minimum distance of the code $C$.

Let us see the implication of BDD for channel coding and, in turn, for syndrome source coding. For a bSC with crossover probability $p$, the expected Hamming distance between the transmitted codeword $\mathbf{c}$ and the received word $\mathbf{r}$ is $E[d(\mathbf{c},\mathbf{r})] = np$. Thus, for unambiguous decoding, we need a code $C$ with $d_{\min} > 2np$. However, the largest rate possible for a binary code with $d_{\min} = 2np$ is upper bounded by $1 - H(2p)$, see [15], which is much smaller than the capacity of the channel, $1 - H(p)$. This in turn implies that the compression rate achievable with syndrome source coding that relies on BDD is lower bounded by $H(2p)$, which is much larger than the conditional entropy $H(p)$.
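To make the gap between $H(p)$ and $H(2p)$ concrete, the short sketch below evaluates both quantities for a few crossover probabilities. The sample values of $p$ and the function names are chosen here for illustration only.

```python
from math import log2


def binary_entropy(p):
    """H(p) = -p lg p - (1 - p) lg(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def rate_penalty(p):
    """Return (H(p), H(2p)): the ideal rate and the BDD lower bound."""
    return binary_entropy(p), binary_entropy(2 * p)


# Illustrative crossover probabilities (assumed, all with 2p < 1/2).
for p in (0.05, 0.10, 0.20):
    ideal, bdd = rate_penalty(p)
    print(f"p={p:.2f}  H(p)={ideal:.4f}  H(2p)={bdd:.4f}  gap={bdd - ideal:.4f}")
```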

In the above scenario, the main constraint is the requirement of unique decoding, which limits the decoding radius to $\lfloor (d_{\min} - 1)/2 \rfloor$. One way to circumvent this limitation is to increase the decoding radius beyond $\lfloor (d_{\min} - 1)/2 \rfloor$ and allow the decoder to output a list of codewords. This approach is feasible as long as (i) the list contains only a small number of codewords and (ii) there is an effective way of extracting the correct codeword from the list. The method of decoding beyond $\lfloor (d_{\min} - 1)/2 \rfloor$ is known as list decoding in the literature. Let $B_q(\mathbf{r}, e)$ denote the Hamming sphere of radius $e$ around a point $\mathbf{r}$ in the space $GF(q)^n$. A code $C$ over $GF(q)$ is said to be $(p, L)$ list-decodable if $|B_q(\mathbf{r}, np) \cap C| \le L$ for every $\mathbf{r} \in GF(q)^n$. List decoding is considered feasible as long as $L$ grows polynomially with the block length $n$.

To assess the feasibility of this method, let us look at the theoretical limits on list decoding. It has been shown in [16] that for any integer $L \ge 2$, there exists a family of binary linear $(p, L)$ list-decodable channel codes of rate $R \ge 1 - H(p) - 1/L$. Allowing $L$ to grow, a rate arbitrarily close to the theoretical limit $1 - H(p)$ can be achieved. This in turn implies that the corresponding syndrome source coders have compression rate $\le H(p) + 1/L$, which can be made arbitrarily close to the conditional entropy $H(p)$ by allowing $L$ to grow.

For non-binary alphabets, the capacity of list decoding is similar to the binary linear case. It has been shown in [16] that for any alphabet size $q \ge 2$, list size $L \ge 2$, and $p \in (0, 1 - 1/q)$, there exists a family of $(p, L)$ list-decodable $q$-ary channel codes of rate $R \ge 1 - H_q(p) - 1/L$. Here $H_q(p) = p \log_q(q-1) - p \log_q p - (1-p) \log_q(1-p)$ is the $q$-ary entropy function. Thus, if the $q$-ary source $\{X_i\}$ and the side information $\{Y_i\}$ are correlated in such a way that

$$\Pr(X_i \ne Y_i) = p, \qquad (2)$$

then $H(X|Y) = H_q(p)$. If a linearity constraint is imposed on the channel code, then the best known limit on list-decoding capacity [17] turns out to be $R \ge 1 - H_q(p) - 1/\log_q(L+1)$. Consequently, the corresponding $q$-ary syndrome source coders have a compression rate $\le H_q(p) + 1/\log_q(L+1)$. We see that this requires an exponentially larger list size $L$ than its binary counterpart.
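The two rate penalties, $1/L$ and $1/\log_q(L+1)$, can be compared numerically. The sketch below computes the $q$-ary entropy function and the smallest list size needed to bring each penalty under a target gap; the function names and the target gap in the usage note are assumptions made for illustration.

```python
from math import ceil, log


def q_ary_entropy(p, q):
    """H_q(p) = p log_q(q-1) - p log_q p - (1-p) log_q(1-p), for 0 <= p < 1."""
    if p == 0.0:
        return 0.0
    return (p * log(q - 1) - p * log(p) - (1 - p) * log(1 - p)) / log(q)


def gap_general(L):
    """Rate penalty 1/L of the existence result without a linearity constraint."""
    return 1.0 / L


def gap_linear(L, q):
    """Rate penalty 1/log_q(L + 1) of the result for linear q-ary codes."""
    return 1.0 / log(L + 1, q)


def list_size_general(gap):
    """Smallest L with 1/L <= gap."""
    return ceil(1.0 / gap)


def list_size_linear(gap, q):
    """Smallest L with 1/log_q(L + 1) <= gap, i.e. L >= q**(1/gap) - 1."""
    return ceil(q ** (1.0 / gap)) - 1
```

For a target gap of 0.25, `list_size_general(0.25)` gives 4, whereas `list_size_linear(0.25, 4)` gives 255, which illustrates the exponential growth noted above.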

A geometrical interpretation of syndrome source coding using list decoding: Consider the correlation model between the source and side information as defined in (2). According to the law of large numbers, for large enough $n$, given a side information sequence $\mathbf{y} \in GF(q)^n$, the source sequence $\mathbf{x} \in GF(q)^n$ will with high probability lie within a thin shell on the surface of $B_q(\mathbf{y}, np)$. In fact, the thin shell corresponds to the set of sequences that are jointly typical with $\mathbf{y}$. The total number of points in the shell is approximately equal to $|B_q(\mathbf{y}, np)|$, since for large $n$ almost all the points in $B_q(\mathbf{y}, np)$ are in the thin shell. It is known [16] that the number of points contained in $B_q(\mathbf{y}, np)$ is bounded by

$$|B_q(\mathbf{y}, np)| \le q^{nH_q(p)}. \qquad (3)$$

Now consider syndrome encoding of $\mathbf{x}$ using an $(n,k)$ channel code $C$ over $GF(q)$. Clearly $C$ induces a partitioning of the source data space $GF(q)^n$ into $q^{n-k}$ cosets. Since the points in the shell are uniformly distributed over $GF(q)^n$, for any syndrome $\mathbf{s}$, the number of points in $B_q(\mathbf{y}, np) \cap C_s$ is approximately $q^{nH_q(p)} / q^{n-k}$. From the list decoding point of view, this syndrome source coding is feasible if $|B_q(\mathbf{y}, np) \cap C_s|$ is small, which holds only if $R = k/n < 1 - H_q(p)$.
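The counting argument can be checked numerically for moderate $n$. The sketch below computes the exact size of a Hamming sphere, compares it with the bound (3), and evaluates the exponent of the estimate $q^{nH_q(p)}/q^{n-k}$; the function names and the parameter values in the usage note are illustrative assumptions.

```python
from math import comb, floor, log


def q_ary_entropy(p, q):
    """q-ary entropy function H_q(p) for 0 < p < 1."""
    return (p * log(q - 1) - p * log(p) - (1 - p) * log(1 - p)) / log(q)


def ball_size(n, radius, q):
    """Exact number of points of GF(q)^n within Hamming distance `radius`."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(radius + 1))


def log_q_ball(n, p, q):
    """log_q |B_q(y, np)|, using radius floor(np)."""
    return log(ball_size(n, floor(n * p), q)) / log(q)


def log_q_points_per_coset(n, k, p, q):
    """Exponent of the estimate q^{n H_q(p)} / q^{n-k} for one coset."""
    return n * q_ary_entropy(p, q) - (n - k)
```

For example, with $n = 255$, $q = 256$, and $p = 0.3$, the exponent returned by `log_q_points_per_coset` is negative for $k = 88$ and positive for $k = 200$, in line with the condition $R < 1 - H_q(p)$.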

Extracting the correct sequence from the list: CRC codes are widely used in practice for error detection. A CRC-$\rho$ code is defined by a generator polynomial of degree $\rho$ that assigns a $\rho$-bit parity to a sequence. In the setting of list decoding-based syndrome source coding, the use of a CRC code is expected to be effective for at least two reasons. Firstly, while in the context of channel coding the CRC bits are also subject to channel noise, this is not the case for syndrome source coding. In syndrome source coding, we can assume that these CRC bits, along with the syndrome, are available to the decoder without error. Secondly, since the list size $L$ is small (polynomial in $n$), only a few parity bits should be sufficient to correctly identify the desired sequence. A CRC-$\rho$ code is expected to identify the correct codeword from a list of size $L$ if $\rho > \lg L$. Thus a syndrome source coder based on list decoding that uses a CRC code to extract the correct sequence has a compression rate of $H_q(p) + 1/\log_q(L+1) + (\lg L)/n$, which approaches $H_q(p)$ as $L$ and $n$ grow.

Given a linear $(p, L)$ list-decodable code $C$ over $GF(q)$ of rate $R < 1 - H_q(p)$, let us articulate the encoding and decoding operations involved in the syndrome coding of $\mathbf{x}$ in the presence of side information $\mathbf{y}$ available only at the decoder.

Encoding: Let $H$ be the parity check matrix of the code $C$. Then the syndrome of a source sequence $\mathbf{x}$ can be computed as $\mathbf{s} = H\mathbf{x}^T$. Let $g(\xi)$ be the generator polynomial of a CRC-$\rho$ code, where $\rho$ is the number of CRC bits, and let $x(\xi)$ be the polynomial of degree at most $n-1$ that corresponds to the sequence $\mathbf{x}$. Then the $\rho$-bit CRC of $\mathbf{x}$ corresponds to the polynomial $h(\xi) = x(\xi) \bmod g(\xi)$ of degree at most $\rho - 1$.

Decoding: First we need to list decode $C_s$, considering the side information $\mathbf{y}$ as the received word. For this, given any $\mathbf{a} \in C_s$, the list decoding algorithm for $C$ can be used as follows. Compute $\mathbf{y}' = \mathbf{y} - \mathbf{a}$. Using the list decoding algorithm for $C$, determine the list $\mathcal{L}$ consisting of those codewords of $C$ that are within the Hamming sphere of radius $np$ around $\mathbf{y}'$. Then, adding $\mathbf{a}$ to each of the codewords in $\mathcal{L}$, we get the list $\mathcal{L}_s$ consisting of the words from $C_s$ that are at a Hamming distance $\le np$ from $\mathbf{y}$. Finally, from $\mathcal{L}_s$ pick the word $\hat{\mathbf{x}}$ such that $\hat{x}(\xi) \bmod g(\xi) = h(\xi)$.

Remarks: The problem of finding any $\mathbf{a} \in C_s$ amounts to solving the system of linear equations $H\mathbf{a}^T = \mathbf{s}$. Since $H$ has full row rank and there are more unknowns than equations, the system always has a solution. Thus, we can find an $\mathbf{a} \in C_s$ in polynomial time, for example, using Gaussian elimination.
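The encoding and decoding steps above can be sketched end to end on a toy binary example. The (7,4) Hamming code, the 3-bit generator polynomial, the decoding radius, and the brute-force search that stands in for a real list decoder are all assumptions made for illustration; at this size the CRC may leave more than one candidate, so `decode` returns every word of the list whose CRC matches.

```python
from itertools import product

# Toy stand-ins (assumptions): the (7,4) Hamming code and a 3-bit CRC.
H = [(0, 0, 0, 1, 1, 1, 1),
     (0, 1, 1, 0, 0, 1, 1),
     (1, 0, 1, 0, 1, 0, 1)]
G_POLY = (1, 1, 0, 1)  # g(xi) = xi^3 + xi^2 + 1, highest degree first
N = len(H[0])


def syndrome(v):
    """s = H v^T over GF(2)."""
    return tuple(sum(h & b for h, b in zip(row, v)) % 2 for row in H)


def crc(bits):
    """Coefficients of x(xi) mod g(xi); bits[0] is the highest-degree term."""
    r = list(bits)
    for i in range(len(r) - len(G_POLY) + 1):
        if r[i]:
            for j, gj in enumerate(G_POLY):
                r[i + j] ^= gj
    return tuple(r[-(len(G_POLY) - 1):])


def coset_representative(s):
    """Solve H a^T = s by Gaussian elimination over GF(2) (H full row rank)."""
    rows = [list(r) + [b] for r, b in zip(H, s)]  # augmented matrix
    pivots, rank = [], 0
    for col in range(N):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        pivots.append(col)
        rank += 1
        if rank == len(rows):
            break
    a = [0] * N  # free variables are set to zero
    for i, col in enumerate(pivots):
        a[col] = rows[i][N]
    return tuple(a)


CODEWORDS = [v for v in product((0, 1), repeat=N) if not any(syndrome(v))]


def list_decode(word, radius):
    """Brute-force stand-in for a list decoder of C."""
    return [c for c in CODEWORDS
            if sum(a != b for a, b in zip(c, word)) <= radius]


def encode(x):
    """Return the syndrome and the CRC of x."""
    return syndrome(x), crc(x)


def decode(s, h, y, radius):
    """Return the words of C_s within `radius` of y whose CRC equals h."""
    a = coset_representative(s)
    y_shifted = tuple(yi ^ ai for yi, ai in zip(y, a))  # y' = y - a
    coset_list = [tuple(ci ^ ai for ci, ai in zip(c, a))
                  for c in list_decode(y_shifted, radius)]
    return [x for x in coset_list if crc(x) == h]
```

Whenever $d(\mathbf{x}, \mathbf{y})$ does not exceed the chosen radius, the true $\mathbf{x}$ is guaranteed to appear in the output of `decode`; with realistic block lengths and a suitable CRC, the output is expected to contain $\mathbf{x}$ alone.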



IV. Constructive Code Design 






Although the encoding and decoding algorithms presented in the previous section are theoretically sound, there are at least two challenges in designing codes for real-world applications. Firstly, the scheme assumes that $d(\mathbf{x}, \mathbf{y}) \approx np$ with high probability, which holds when $n \to \infty$. However, in the real world we have to operate with finite $n$. Secondly, the scheme also depends on the availability of a $(p, L)$ list-decodable code with an efficient encoder and decoder. To date, efficient list decoding algorithms are known for the families of Reed-Solomon (RS), Bose-Chaudhuri-Hocquenghem (BCH), and Reed-Muller (RM) codes. Associated with each of these codes $C$ is a known list decoding radius $\tau$. Before presenting the main results on these families of codes and their potential in syndrome source coding, we provide a general guideline that can be used to design practical SCSI schemes for the correlation model defined in (2).

Block length n: For better performance it is desirable to have 
n as large as possible. However, for large n the computational 
complexity may become problematic. While RS codes in 
practical applications are mostly of length n = 256 (due to 
the byte oriented world), it is feasible to go up to n = 1024 
with binary BCH and RM codes. 

List decoding radius $\tau$: In theory, for $n$ approaching $\infty$, a list decoding radius of $\tau < np + \delta$, where $\delta > 0$, is sufficient. In practice, for fixed $n$ we need a list decoding radius of $\tau \ge T_\epsilon$, where $T_\epsilon$ is such that $\Pr(d(\mathbf{x}, \mathbf{y}) > T_\epsilon) < \epsilon$. It can be shown that for large $n$ we need to choose $T_\epsilon$ only slightly larger than $np$. Under the correlation model (2), the distance $d(\mathbf{x}, \mathbf{y})$ is a random variable with a binomial distribution of mean $np$ and standard deviation $\sqrt{np(1-p)}$. For large $n$, with high probability $d(\mathbf{x}, \mathbf{y})$ is in the vicinity of $np$. For example, for $n = 1000$, $p = 0.4$, and $\epsilon = 10^{-4}$, the value of $T_\epsilon$ is 459. If we want to decrease the error probability to $< 10^{-5}$, we need to increase $T_\epsilon$ only by 9, to 468.
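The threshold $T_\epsilon$ can be obtained from the exact binomial tail. The sketch below is one way to do so, assuming the definition $\Pr(d > T_\epsilon) < \epsilon$ with $d$ binomially distributed; the function names are hypothetical, and small differences from the quoted figures may arise from how the tail inequality is defined at the boundary.

```python
from math import exp, lgamma, log


def binomial_pmf(n, k, p):
    """Pr(d = k) for d ~ Binomial(n, p), evaluated in the log domain."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def radius_threshold(n, p, eps):
    """Smallest integer T with Pr(d > T) < eps, for 0 < p < 1."""
    tail = 0.0  # Pr(d > T) for the current T
    T = n
    while T > 0:
        candidate = tail + binomial_pmf(n, T, p)  # Pr(d > T - 1)
        if candidate >= eps:
            break
        tail = candidate
        T -= 1
    return T
```

Calling `radius_threshold(1000, 0.4, 1e-4)` and `radius_threshold(1000, 0.4, 1e-5)` should give values close to those quoted above, a few standard deviations above $np = 400$.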

Code rate $R$: Since the compression rate achieved with syndrome source coding based on a channel code of rate $R$ is $1 - R$, we need to pick the code of largest rate $R$ whose list decoding radius is at least $T_\epsilon$.

CRC code generator $g(\xi)$: There are a number of standard CRC codes, see [18]. Since the list produced by the list decoder is guaranteed to be small, a few CRC bits are enough to correctly extract the source sequence from the list. In practice, CRC-12 is expected to be enough, and for $n = 1000$ this incurs only 1.2% redundancy.

In the following we discuss the main list decoding results for 
the families of RS, BCH, and RM codes and their implication 
for syndrome source coding. 

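RS code: An $(n,k)$ RS code $C$ over $GF(q)$ is defined by the following parity check matrix $H$, where $1 \le k < n < q$, $b$ is an integer, and $\alpha$ is an element of $GF(q)$ of multiplicative order $n$:

$$H = \begin{pmatrix} 1 & \alpha^{b} & \alpha^{2b} & \cdots & \alpha^{(n-1)b} \\ 1 & \alpha^{b+1} & \alpha^{2(b+1)} & \cdots & \alpha^{(n-1)(b+1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \alpha^{b+n-k-1} & \alpha^{2(b+n-k-1)} & \cdots & \alpha^{(n-1)(b+n-k-1)} \end{pmatrix}.$$

It follows from the parity check matrix that $\mathbf{c} \in C$ if and only if $c(\alpha^{b+j}) = 0$ for all $0 \le j \le n-k-1$, where $c(\xi)$ is the polynomial corresponding to $\mathbf{c}$. An RS code can also be defined by the following generator matrix, where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are distinct nonzero elements of $GF(q)$:

$$G = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \alpha_1 & \alpha_2 & \cdots & \alpha_n \\ \vdots & \vdots & & \vdots \\ \alpha_1^{k-1} & \alpha_2^{k-1} & \cdots & \alpha_n^{k-1} \end{pmatrix}.$$

According to this generator matrix, the codeword corresponding to a message $\mathbf{u}$ can be computed as $\mathbf{c} = (u(\alpha_1), u(\alpha_2), \ldots, u(\alpha_n))$. The family of RS codes is maximum distance separable (MDS) and thus has $d_{\min} = n-k+1$.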

A list decoding algorithm was first discovered for low-rate RS codes by Sudan [19] and later improved and extended to all rates by Guruswami and Sudan [20]. For an RS code of rate $R$, the Guruswami-Sudan algorithm can correct up to $n(1 - \sqrt{R})$ errors, which is clearly beyond half the minimum distance. The Guruswami-Sudan algorithm uses the polynomial representation corresponding to $G$. Given a received word $\mathbf{r}$, the essential idea of the algorithm is to find the polynomials $u(\xi)$ of degree at most $k-1$ such that $u(\alpha_j) = r_j$ for at least $n - \tau$ of the $n$ positions $j$. Recently, Wu [21] has proposed an alternative algorithm that achieves the same list decoding radius with reduced complexity. Wu's algorithm relies on the polynomial representation corresponding to $H$ and is akin to the Berlekamp-Massey algorithm.

Example: Let us design an SCSI scheme for the correlation model defined in (2), given that $q = 2^8$, $p = 0.3$, $\epsilon = 10^{-4}$, and $n = 255$. For these values of $n$, $p$, and $\epsilon$, the value of $T_\epsilon$ turns out to be $T_\epsilon = 105$. Thus we need a code having $\tau \ge 105$. Using the fact that an RS code of rate $R$ has $\tau = n(1 - \sqrt{R})$, we find that the (255, 88) code is the desired RS code. Thus the compression rate achieved with this scheme is $1 - R = 0.6549$. When 12 CRC bits are considered, the compression rate increases to 0.702. In contrast, unique decoding would require a code with $d_{\min} = 2T_\epsilon + 1 = 211$. It is the (255, 45) RS code that has $d_{\min} = 211$. The use of this code for syndrome source coding with unique decoding can only achieve a compression rate of 0.8235.
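The code selection in this example can be reproduced with a few lines. The sketch below assumes the radius formula $n(1 - \sqrt{k/n})$ quoted above and the MDS property $d_{\min} = n - k + 1$; the function names are hypothetical.

```python
from math import sqrt


def gs_radius(n, k):
    """List decoding radius n(1 - sqrt(R)) of an (n, k) RS code, R = k/n."""
    return n * (1 - sqrt(k / n))


def largest_rs_dimension(n, T):
    """Largest k whose list decoding radius is at least T."""
    return max(k for k in range(1, n) if gs_radius(n, k) >= T)


def unique_decoding_dimension(n, T):
    """Dimension of the RS code with d_min = 2T + 1 (MDS: d_min = n - k + 1)."""
    return n - (2 * T + 1) + 1


n, T = 255, 105
k_list = largest_rs_dimension(n, T)          # 88
k_unique = unique_decoding_dimension(n, T)   # 45
print(k_list, round(1 - k_list / n, 4))      # compression rate 0.6549
print(k_unique, round(1 - k_unique / n, 4))  # compression rate 0.8235
```

The figure 0.702 quoted above is reproduced by adding $12/n$ to the list decoding rate, that is, `(n - k_list + 12) / n`.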

Binary BCH code: Binary BCH codes can be interpreted as alternant codes of RS codes [10], i.e., if $C_{RS}$ is an RS code over $GF(2^m)$, then $C_{RS} \cap GF(2)^n$ is a BCH code. This interpretation allows Wu's list decoding algorithm for RS codes to be used for the list decoding of BCH codes. However, Wu [21] has also presented an improved algorithm for list decoding of binary BCH codes that achieves a list decoding radius of $\tau = \frac{n}{2}(1 - \sqrt{1 - 2D})$, where $D = d_{\min}/n$ is the designed relative distance of the BCH code.

Example: Consider designing an SCSI scheme for the correlation model defined in (1) for $p = 0.2$ and $\epsilon = 10^{-4}$. As binary BCH codes of length up to 1023 can be implemented without any difficulty, we choose $n = 1023$. For the given values of $n$, $p$, and $\epsilon$, we calculate $T_\epsilon = 254$. The (1023, 56) BCH code with $D > 0.3743$ has designed distance greater than 382, which gives a list decoding radius of at least $T_\epsilon$, and thus can achieve an error probability of $P_e < 10^{-4}$ if used for syndrome source coding. The compression rate of this scheme is 0.9453, which increases slightly to 0.9570 when 12 CRC bits are considered. Note that unique decoding would need a code with $d_{\min} > 508$. The BCH code of designed distance $> 508$ is the (1023, 11) code, which only achieves a compression rate of 0.9892.
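The numbers in this example can be checked against the radius formula $\frac{n}{2}(1 - \sqrt{1 - 2D})$. In the sketch below, the designed distance 383 is an assumption inferred from $D > 0.3743$ (the smallest odd integer above $0.3743 \times 1023$), and the function names are hypothetical.

```python
from math import sqrt


def binary_bch_list_radius(n, d_designed):
    """Radius (n/2)(1 - sqrt(1 - 2D)) with D = d_designed / n."""
    D = d_designed / n
    return n / 2 * (1 - sqrt(1 - 2 * D))


def compression_rate(n, k, crc_bits=0):
    """Syndrome length plus CRC bits, per source symbol."""
    return (n - k + crc_bits) / n


n, T = 1023, 254
d_designed = 383  # assumed from D > 0.3743 in the text
print(binary_bch_list_radius(n, d_designed) >= T)  # list decoding suffices
print((d_designed - 1) // 2 >= T)                  # unique decoding does not
print(round(compression_rate(n, 56), 4))
print(round(compression_rate(n, 56, crc_bits=12), 4))
print(round(compression_rate(n, 11), 4))
```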

RM codes: For any integers $m$ and $r$ with $0 \le r \le m$, the $r$-th order binary RM code is an $(n = 2^m, k(r,m))$ code having $d_{\min} = 2^{m-r}$, where the dimension is $k(r,m) = 1 + \binom{m}{1} + \cdots + \binom{m}{r}$. RM codes can be constructed in many ways. Among these, the Boolean function-based construction [22] is considered the simplest.

The best known list decoding algorithm is the one by Gopalan, Klivans, and Zuckerman [23]. The Gopalan-Klivans-Zuckerman algorithm relies on the Boolean function-based construction and achieves a list decoding radius of $\tau = \frac{n}{2}(1 - \sqrt{1 - 4D})$, where $D$ is the relative distance of the code.

Example: Consider designing an SCSI scheme for binary correlated sources using RM codes. Let $p = 0.3$ and $\epsilon = 10^{-4}$. To operate in high dimension we choose $m = 10$, which corresponds to RM codes of length $n = 1024$. For these values of $n$, $p$, and $\epsilon$, the required list decoding radius turns out to be $T_\epsilon = 364$. According to the list decoding radius of the Gopalan-Klivans-Zuckerman algorithm, we need an RM code with $d_{\min} \ge 235$. The second-order RM code is the (1024, 56) code having $d_{\min} = 256$ and thus can ensure the desired probability of error. The compression rate with this scheme is 0.9453. When the CRC bits are included, it only increases to 0.9570. In contrast, with unique decoding it is not possible to achieve any compression for $p = 0.3$ and $\epsilon = 10^{-4}$, since it requires $d_{\min} > 728$, which is only possible for $r = 0$.
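The parameters in this example follow directly from the formulas for $k(r,m)$, $d_{\min}$, and the list decoding radius. The sketch below searches for the highest RM order that meets a target radius; it applies the radius formula only for $D \le 1/4$ (orders $r \ge 2$), which is an assumption made here to keep the square root real, and the function names are hypothetical.

```python
from math import comb, sqrt


def rm_parameters(r, m):
    """Return (n, k, d_min) of the r-th order binary RM code of length 2^m."""
    n = 2 ** m
    k = sum(comb(m, i) for i in range(r + 1))
    d_min = 2 ** (m - r)
    return n, k, d_min


def gkz_radius(n, d_min):
    """Radius (n/2)(1 - sqrt(1 - 4D)) with D = d_min / n, for D <= 1/4."""
    D = d_min / n
    if 4 * D > 1:
        raise ValueError("formula applied here only for D <= 1/4")
    return n / 2 * (1 - sqrt(1 - 4 * D))


def best_order(m, T):
    """Highest order r >= 2 (largest rate) whose radius is at least T."""
    feasible = []
    for r in range(2, m + 1):
        n, _, d_min = rm_parameters(r, m)
        if gkz_radius(n, d_min) >= T:
            feasible.append(r)
    return max(feasible) if feasible else None


r = best_order(10, 364)
n, k, d_min = rm_parameters(r, 10)
print(r, (n, k, d_min), round(1 - k / n, 4), round((n - k + 12) / n, 4))
```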

V. Conclusion

We applied channel coding ideas on list decoding to the setting of distributed source coding (DSC). In DSC it is customary to model the correlation between the source information and the side information via a virtual channel. In this paper we recognize the advantage that additional bits can be sent outside of the virtual channel. We exploit this advantage to select the correct data from the list obtained by the list decoder. The additional bits are provided by a CRC code. We show that our list decoding-based source coder achieves a significantly better (lower) compression rate than a classical unique decoding-based source coder. Moreover, given the conditional entropy, the proposed approach allows for a clear guideline for choosing the channel rate and channel code corresponding to the desired compression rate. Our future work aims at the design of practical codes through this approach, in particular the design of efficient source decoding methods.